The Blocklist/Blockfile feature is really one of the most powerful new additions to Proxomitron. Actually "block list" is a bit of a misnomer though, since they can do much more than just block. In fact, they're really an extension of the normal matching expressions.
Most often you'll see them in a filter's URL match, but they're by no means limited to URLs alone. A list can contain any matching commands - even calls to other lists!
When it comes right down to it, a block list is simply a text file containing a list of matching items. Each line in the list is checked until a match is found - otherwise the list returns false. Every block list file has a name (in the settings dialog), and can be included at any point in a matching expression by using "$LST(listname)".
You can have up to 255 different lists and use them in any way you like. Common uses could include URLs to kill, sites to accept cookies from, pages to disable or enable JavaScript on, etc. To create a new file just follow these steps...
- First, use any text editor (like Notepad) to make a list. Save it as a plain text file (somename.txt, for example).
- Next, add the list under Proxomitron's Blockfile tab in the config dialog. Use "ADD" to select your file then give it a name. The name you use inside Proxomitron can be different from the filename itself. This lets you swap out actual blockfiles without having to change any filters that use them.
- Finally, to use the blockfile, place a $LST(BlockfileName) in your filter at the point you would like the list items to be checked. Think of it a bit like including your list at this point.
For example, if you had a matching expression like...
(Keitarou|Naru|Suu|Mitsune|Motoko|Shinobu|Mutsumi|Kame)
You could create a list like so..
#
# Sample List LoveHina.txt
#
Keitarou
Naru
Suu
Mitsune
Motoko
Shinobu
Mutsumi
Kame
Then, name the list something like "LoveHina" and place "$LST(LoveHina)" in your match in place of the old parenthesized group of "|" separated items. As you might guess, this can be a very convenient way to deal with a large number of items. Not only are such lists easier to maintain, but the same list can be used in different filters!
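Since a filter's URL match is the most common place you'll call a list, here's a quick, purely hypothetical sketch of that use as well - the list name "AdHosts" and the hostnames in it are just placeholders...

#
# Hypothetical "AdHosts" list
#
(www.|)ads.somehost.com/
(www.|)banners.otherhost.com/

A banner-killing filter could then use "$LST(AdHosts)" in its URL match, and new hosts can be added to the list later without ever touching the filter itself.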
Unmatching a match
You can also add "exclude" lines by prefixing a line with the '~' character. They can be used to limit what a list will match, and are only checked if a regular match is found first in the list. The list returns true only if none of the exclude lines match. For example, a list like...
#
# Another sample list
# an example of using `~` to exclude
#
*.gif
~*/gamera.gif
The first line matches anything ending in ".gif". The second line then checks whether the URL also matches "*/gamera.gif", which ensures Gamera is never caught in our list (think of it as a Turtle Excluder Device).
Obfuscating more clearly
Lists can also be called in the replacement text of a filter. Here they're not really used to match anything but instead are used to set a positional variable to some value. For instance, by using the $CON(#,#) along with the $SET(#=...) matching commands, replacement text variables can be rotated based on the connection number (like multiple User-Agents or browser versions for instance)....
#
# A sample value rotation list (named "MyList")
#
$CON(1,3)$SET(0=Value One)
$CON(2,3)$SET(0=Value Two)
$CON(3,3)$SET(0=Value Three)
Each time it's called, this list will place the next of the three values into \0. You could use this in a replacement section like so...
$LST(MyList) \0
First we call the list to set \0 then we print the value of \0 by placing it in the replacement text.
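As a more concrete (and entirely hypothetical) sketch of the User-Agent idea mentioned above, a rotation list named "AgentList" might look like this - the list name and the agent strings are just placeholders...

#
# Hypothetical "AgentList" - picks a different \0 value per connection
#
$CON(1,2)$SET(0=SpamBrowser/1.0)
$CON(2,2)$SET(0=EggBrowser/2.0)

A hypothetical outgoing "User-Agent:" header filter (matching "*") could then use "$LST(AgentList)\0" as its replacement text to send a different made-up agent string on alternating connections.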
Breaking up isn't so hard to do
Normally, each line in a list is treated as an independent matching expression. However long expressions can be broken up over multiple lines by indenting the lines that follow. For example...
taste (this|that|these|those|the other thing)
could also be written as....
taste (
   this|
   that|
   these|
   those|
   the other thing)
The effect is exactly the same, but you can break long lines up for easier reading - leading or trailing whitespace on each line will be ignored.
Some comments on comments
Also, as you've probably guessed from the examples, lists can contain comments by beginning a line with '#'. Comments are normally ignored, but the first few lines of a list are scanned for certain "keywords" which can affect how the list works. Currently there are five keywords: "NoAddURL", "JunkBuster", "NoHash", "NoUrlHash", and "NoPreHash".
"NoAddURL" hides the list from the "Add to blockfile" menu in the sysytem tray. It's useful to keep it from becoming cluttered by lists not used for URL matches.
"JunkBuster" if found, will cause Proxomitron to attempt to read and interpret the list as a JunkBuster style blockfile. It's probably less than perfect, but seems to work fairly well with most JunkBuster lists.
Note that due to differences in methodology, designing your own list by adding URLs as you find them will likely be more efficient. In particular, JunkBuster processes hostnames in reverse (root first). Proxomitron treats a URL the same as any random text, so you're better off not using an initial wildcard. For instance, "(www.|)somehost.com" will be much faster than "*somehost.com". If you need a leading wildcard try "[^/]++somehost.com". It's a little better than '*' since it only scans up to the first "/" in the URL.
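To make that concrete, here are those same expressions side by side as list entries (the hostname is just a placeholder):

# Slowest - a leading "*" may scan the entire URL
*somehost.com/
# Better - "[^/]++" only scans up to the first "/"
[^/]++somehost.com/
# Fastest - no leading wildcard at all
(www.|)somehost.com/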
"NoHash", "NoUrlHash", and "NoPreHash" are used to disable various hashing algorithms used in lists. NoHash eliminates all hashing and can same memory for list that are seldom called or where speed isn't an issue. "NoUrlHash" and "NoPreHash" disable particular hash types (see below). You probably shouldn't need to use these very often (if at all).
Blocklist Indexing (hashes)
Another advantage lists have over using a bunch of ORs in a match is speed. Proxomitron can do a sort of indexed hash lookup on eligible list entries. Not everything can be indexed, but for items that can, it makes finding matches in a large list much faster. Normally you don't need to worry much about how this works, but if you want to guarantee your blocklist will be as fast as possible here's some tips...
First Proxomitron checks each item in the list to see if it's "hashable" (can be indexed). If so, it's added to a hashable list - if not, it's added to a non-hashable list that must be scanned each time the list is checked. Of course, it's better to be hashable.
There's two types of indexes Proxomitron can use - a fixed prefix and a URL style index. Each item in the list is checked to see if it can be indexed by either method; if so, the one that can index the most characters is used for that item. The full list may contain a mixture of both types.
"fixed prefix" are the simplest - they're just any expression that has a fixed number of characters without any wildcards. The longer the prefix is before any wildcard, the more indexable it becomes. Most user added URLs probably fall into this category, but it benefits many non-URL based lists too. Here's a few examples of eligible expressions...
www.somewhere.com
127.0.0.
shonen(knife|)
foo(bar|bat)*bear
ANDs are fine too as in "this*&*that" - however ORs outside of parens like "this|that" won't index since the match can begin with two different values. In this case it's better to place each on its own line.
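For example (a trivial sketch), the single line below can't be hashed, but the two-line version can:

# Not hashable - the match can begin with two different values
this|that
# Hashable - each alternative starts with its own fixed prefix
this
that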
URL style hashes, as the name implies, are designed mainly for lists of URLs. The goal is to allow some leading wildcards to be used, since often this is necessary for matching partial hostnames. It works by looking in the expression for the end of the hostname (marked by a ":" or "/") and indexing back from there. For it to work there must be no other wildcards between the hostname's end and the leading wildcard. Valid wildcards include "*", "\w", "[...]+", "[...]++", and "(...|)". This should cover the most useful ones for leading off URLs. Again here's some examples...
*somehost.com/(anything|after|here|is|fine)/\w.html
\wsomehost.com/
[^.]+.somehost.com/
[^/]++somehost.com/
(www.|)somehost.com:[0-9]+/
([^/]++.|)somehost.com/
Unfortunately things like...
([^/]++.|)somehost.*/
([^/]++.|)somehost.(com|net)/
won't be indexable. In these cases it may actually be faster to write them as two entries...
([^/]++.|)somehost.com/
([^/]++.|)somehost.net/
One change blocklist authors should take note of: when using a leading wildcard, it's very important to use the full hostname, including the trailing "/". Previously this wasn't necessary, so an entry like...
([^/]++.|)microsoft.
would be better written as...
([^/]++.|)microsoft.com/
or perhaps multiple entries if necessary. It also means there's less of a need to write expressions like...
www.(ad(server|engine|banner)|banner(site|click|)).(com|net)
Instead, listing all the actual hosts to be matched will be faster - not to mention easier to maintain.
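In other words, a plain list of the hosts you actually see (these are just illustrative expansions of the expression above) will both index and read better:

www.adserver.com
www.adengine.net
www.bannersite.com
www.bannerclick.com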
Limitations...
Blocklists do have some limitations that make them work a bit differently from a list of OR separated items. Mainly, they operate in their own scope. For example, say you have a match like...
www.$LST(hosts).com

and a list like...
#
# Host list
#
adsite
adsite-2

Given the host "www.adsite-2.com", the match would fail even though it looks like it should succeed. This is because the list first finds a match with "adsite" but, being in its own scope, can't look beyond it to see that the ".com" still needs to be matched. To avoid this it's important to make matches unambiguous - for example, by moving the trailing "." into the list like so...
www.$LST(hosts)com
#
# Host list
#
adsite.
adsite-2.
Also be aware that a "*" at the end of a list item will match all the way to the end of the available characters. Normally you don't want this; instead, place any trailing "*" in the calling match...
www.$LST(hosts)*.com
The end...
Well that should be about all there is to know about blocklists. If you've made it this far I officially award you a C.P.B.F.E. (Proxomitron Certified Blocklist (or Blockfile) Engineer). Now, won't that look good on your next Resume!